Search results: Creators/Authors contains "Bhat, Harish S"


  1. When faced with severely imbalanced binary classification problems, we often train models on bootstrapped data in which the instances of each class occur in a more favorable ratio, often equal to one. We view algorithmic inequity through the lens of imbalanced classification: in order to balance the performance of a classifier across groups, we can bootstrap to achieve training sets that are balanced with respect to both labels and group identity. For an example problem with severe class imbalance—prediction of suicide death from administrative patient records—we illustrate how an equity‐directed bootstrap can bring test set sensitivities and specificities much closer to satisfying the equal odds criterion. In the context of naïve Bayes and logistic regression, we analyze the equity‐directed bootstrap, demonstrating that it works by bringing odds ratios close to one, and linking it to methods involving intercept adjustment, thresholding, and weighting.

     
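The entry above turns on a single data-preparation step: resampling the training set so that every (label, group) cell is equally represented. Below is a minimal sketch of that step, assuming a pandas DataFrame and hypothetical column names ("y", "group"); it illustrates the idea of an equity-directed bootstrap rather than reproducing the authors' code.

```python
# Sketch: bootstrap a training set so that label x group cells are balanced.
# Column names "y" and "group" are hypothetical placeholders.
import pandas as pd

def equity_bootstrap(df: pd.DataFrame, label_col: str = "y",
                     group_col: str = "group", seed: int = 0) -> pd.DataFrame:
    """Resample with replacement so every (label, group) cell has equal count."""
    cells = df.groupby([label_col, group_col])
    target = max(len(cell) for _, cell in cells)          # size of the largest cell
    pieces = [cell.sample(n=target, replace=True, random_state=seed)
              for _, cell in cells]
    return pd.concat(pieces).sample(frac=1.0, random_state=seed)  # shuffle rows
```

A classifier (for example, logistic regression or naïve Bayes) trained on the resampled frame can then be evaluated on an untouched test set, comparing sensitivity and specificity across groups against the equal odds criterion.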
  2. We develop algorithms to automate discovery of stochastic dynamical system models from noisy, vector-valued time series. By discovery, we mean learning both a nonlinear drift vector field and a diagonal diffusion matrix for an Itô stochastic differential equation in ℝ^d. We parameterize the vector field using tensor products of Hermite polynomials, enabling the model to capture highly nonlinear and/or coupled dynamics. We solve the resulting estimation problem using expectation maximization (EM), which involves two steps. First, we augment the data via diffusion bridge sampling, with the goal of producing time series observed at a higher frequency than the original data. Second, with this augmented data, the expected log likelihood maximization problem reduces to a least squares problem. We provide an open-source implementation of this algorithm. Through experiments on systems with dimensions one through eight, we show that this EM approach enables accurate estimation for multiple time series with possibly irregular observation times. We study how the EM method performs as a function of the amount of data augmentation, as well as the volume and noisiness of the data.
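For the scalar case, the reduction to least squares mentioned in the entry above can be sketched as follows, assuming an Euler-Maruyama discretization, a probabilists' Hermite basis, and constant diffusion; this is an illustration of the M-step idea, not the paper's open-source implementation.

```python
# Sketch: with data observed (or bridge-augmented) at fine spacing dt, the
# Euler-Maruyama model dX = f(X) dt + g dW makes drift estimation a linear
# least-squares problem in Hermite-polynomial coefficients.
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def fit_drift_hermite(x, dt, degree=4):
    """Return Hermite coefficients of the drift and a constant-diffusion estimate."""
    dx = np.diff(x)                       # increments X_{k+1} - X_k
    Phi = hermevander(x[:-1], degree)     # design matrix: He_0..He_degree at X_k
    coef, *_ = np.linalg.lstsq(Phi, dx / dt, rcond=None)
    resid = dx / dt - Phi @ coef
    g2 = np.mean(resid ** 2) * dt         # Var(residual) is roughly g^2 / dt
    return coef, np.sqrt(g2)

# Example: an Ornstein-Uhlenbeck path dX = -X dt + 0.5 dW; coef[1] should be
# near -1 and the diffusion estimate near 0.5.
rng = np.random.default_rng(0)
dt, n = 1e-3, 200_000
x = np.zeros(n)
for k in range(n - 1):
    x[k + 1] = x[k] - x[k] * dt + 0.5 * np.sqrt(dt) * rng.standard_normal()
coef, g = fit_drift_hermite(x, dt)
```

In higher dimensions the basis becomes tensor products of such polynomials, and the diffusion bridge sampling (the E-step) supplies the finely spaced samples that this regression assumes.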
  3. We consider the problem of learning density‐dependent molecular Hamiltonian matrices from time series of electron density matrices, all in the context of Hartree–Fock theory. Prior work developed a solution to this problem for small molecular systems with density and Hamiltonian matrices of size at most 6 × 6. Here, using a battery of techniques, we scale prior methods to larger molecular systems with, for example, 29 × 29 matrices. This includes systems that either have more electrons or are expressed in large basis sets such as 6‐311++G**. Scaling the method to larger systems enhances its relevance for realistic applications in chemistry and physics. To achieve this scaling, we apply dimensionality reduction, ridge regression, and analytic computation of Hessians. Through the combination of these techniques, we are able to learn Hamiltonians by minimizing an objective function that encodes local propagation error. Importantly, these learned Hamiltonians can then be used to predict electron dynamics for thousands of steps: When we use our learned Hamiltonians to numerically solve the time‐dependent Hartree–Fock equation, we obtain predicted dynamics that are in close quantitative agreement with ground truth dynamics. This includes field‐off trajectories similar to the training data and field‐on trajectories outside of the training data.

     
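The "local propagation error" in the entry above can be made concrete with a small sketch, under assumed notation (atomic units, complex density matrices P_k sampled at spacing dt, and a candidate density-dependent Hamiltonian model): the residual of the time-dependent Hartree–Fock equation i dP/dt = [H(P), P], evaluated with a centered difference at each interior time step, plus a ridge penalty on the model parameters. This illustrates the structure of the objective, not the authors' implementation.

```python
# Sketch: local propagation error for i dP/dt = [H(P), P] (atomic units),
# evaluated with centered finite differences, plus a ridge penalty.
import numpy as np

def local_propagation_error(H_model, P, dt, theta=None, lam=1e-6):
    """P is a sequence of complex density matrices sampled at spacing dt."""
    loss = 0.0
    for k in range(1, len(P) - 1):
        dPdt = (P[k + 1] - P[k - 1]) / (2.0 * dt)   # centered difference
        H = H_model(P[k])                           # density-dependent Hamiltonian
        comm = H @ P[k] - P[k] @ H                  # commutator [H, P]
        resid = 1j * dPdt - comm                    # residual of i dP/dt = [H, P]
        loss += np.sum(np.abs(resid) ** 2)
    if theta is not None:
        loss += lam * np.sum(np.abs(theta) ** 2)    # ridge regularization
    return loss
```

When the Hamiltonian model is linear in its parameters, minimizing a loss of this form is again a ridge-regularized least-squares problem, which is where dimensionality reduction and analytic Hessians help at the 29 × 29 scale described above.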
  4. In this paper, we develop methods to find the sparsest perturbation to a given Markov chain (either discrete- or continuous-time) such that the perturbed Markov chain achieves a desired equilibrium.
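One natural convex-relaxation reading of this problem (an illustration only, not necessarily the paper's exact formulation) treats sparsity through an l1 objective: find a perturbation Delta such that P + Delta remains row-stochastic and has the prescribed stationary distribution.

```python
# Sketch: l1-sparse perturbation of a discrete-time transition matrix P so that
# the perturbed chain has a prescribed stationary distribution pi_target.
import cvxpy as cp
import numpy as np

def sparse_perturbation(P: np.ndarray, pi_target: np.ndarray) -> np.ndarray:
    n = P.shape[0]
    Delta = cp.Variable((n, n))
    Q = P + Delta
    constraints = [
        Q >= 0,                         # probabilities stay nonnegative
        cp.sum(Q, axis=1) == 1,         # rows still sum to one
        pi_target @ Q == pi_target,     # pi_target is stationary for P + Delta
    ]
    cp.Problem(cp.Minimize(cp.sum(cp.abs(Delta))), constraints).solve()
    return Delta.value

# Usage: perturb a 3-state chain toward the uniform equilibrium.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4]])
Delta = sparse_perturbation(P, np.array([1/3, 1/3, 1/3]))
```

For a continuous-time chain, the analogous constraints would act on the generator matrix (zero row sums, nonnegative off-diagonal entries, and pi_target annihilated by the perturbed generator), matching the "either discrete- or continuous-time" scope of the entry above.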